RESEARCH OBJECTIVES


During  the  period  of  the  grant  (December  1,  1987  -  November 
30,  1990),  the  major  research  objectives  have  been  to  perform  both 
analytical  and  experimental  investigations  of  optical  quadratic 
neural networks. The major areas of investigation have been (1)
an electro-optical, outer-product-based architecture for quadratic
neural networks employing polarization encoding and utilizing LCTV
modulators; (2) optical implementations of quadratic neural
networks using photorefractive BaTiO3 crystals to perform the
required  vector-matrix-vector  quadratic  operations  and  using  LCTV 
light  modulators  to  facilitate  updating;  (3)  theoretical/computer 
analyses  and  simulations  of  the  characteristics  of  linear  and 
higher  order  Hebbian-type  neural  networks;  and  (4)  techniques  for 
reducing  the  proliferation  of  weighting  terms  in  higher  order 
neural  networks.  Details  of  the  investigations  are  presented  in 
the  following  sections  and  in  the  publications  referenced. 


SUMMARY  OF  RESULTS 


In  consideration  of  the  large  number  of  journal  publications 
and  conference  proceedings  resulting  from  this  research,  we  will 
briefly  summarize  the  major  results  obtained  in  this  section,  with 
references  made  to  the  appropriate  publications. 

1. Electro-Optical Implementation of a Weighted Outer Product
Processor  Using  Polarization  Encoding 

This  project  investigated  the  design  and  applications  of  a 
weighted  outer  product  processor  based  on  polarization  encoding 
techniques.  The  processor  implements  a  general  quadratic 
polynomial  with  bipolar  coefficients  as  the  outer  product  of  a 
vector x followed by a generalized inner product with a matrix of
weights (coefficients of the polynomial) W. The result is
obtained  as  the  element-by-element  multiplication  between  the 
outer  product  matrix  and  the  weight  matrix  followed  by  spatial 
integration.  The  architecture  is  shown  in  Fig.  1. 
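The operation described above can be sketched numerically. The following is a minimal illustration, not the optical system itself: the function name and the example values are hypothetical, and it assumes the input vector is called x and the weight matrix W.

```python
import numpy as np

def weighted_outer_product(x, W):
    """Evaluate sum_ij W[i, j] * x[i] * x[j] in the order the text describes."""
    outer = np.outer(x, x)      # outer product matrix of the input vector
    weighted = outer * W        # element-by-element multiplication with the weights
    return weighted.sum()       # spatial integration, modeled here as a plain sum

# Illustrative bipolar input and bipolar weights (values chosen arbitrarily).
x = np.array([1.0, -1.0, 1.0])
W = np.array([[0.5, -1.0, 0.0],
              [2.0, 1.0, -0.5],
              [0.0, 1.5, 1.0]])

y = weighted_outer_product(x, W)
```

The result is numerically identical to the quadratic form x^T W x, which is why the same hardware can evaluate a general quadratic polynomial.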

Here,  the  LCTV  modulators  1  and  2  perform  the  outer  product, 
while  LCTVs  3  and  4  perform  the  weighting  on  the  outer  product 
matrix.  The  outputs  are  detected  by  the  photodiodes  and  are  read 
into  a  computer  where  the  information  can  be  used  to  update  the 
weights,  as  in  the  case  of  an  iterative  network.  By  using  the 
properties  of  the  polarization-encoding  technique,  we  have  been 
able  to  reduce  the  space-bandwidth  product  to  one-fourth  of  what 
would normally be required in systems operating with bipolar
numbers. A second system, with only one LCTV for the outer
product and one for the weights, was also designed and tested.
This system traded off increased electronic preprocessing for
simpler optics.

Fig. 1. Polarization-encoded weighted outer product processor.

Among  the  applications  investigated  using  the  outer  product 
processor  was  a  quadratic  neural  network  with  learning.  The 
network  was  tested  on  problems  such  as  the  familiar  Exclusive-OR 
problem  and  pattern  recognition  problems.  As  expected,  it 
demonstrated  a  superior  performance  when  compared  with  linear 
neural  nets.  The  polarization-encoded  optical  outer  product 
processor, which has also been shown capable of optical polynomial
evaluations, was also applied to generate Walsh and Haar
transforms and optical logic. System characteristics such as
throughput, speed, resolution, dynamic range, and cascadability
were also investigated [2].
2. Optical Quadratic Neural Networks Using Barium Titanate

A.  Optical  Quadratic  Processing  Units 

Because quadratic neural networks have a number of advantages
(i.e., increased capacity and increased learning rate) over linear
neural networks [3], an optical quadratic neural network has been
designed and successfully implemented. The network uses the
quadratic decision function (in vector-matrix-vector form)

y = x^T W x ,   (1)

where W is the interconnection weight matrix and x is the neural
network input vector.

Optical four-wave mixing in electrooptic barium titanate
(BaTiO3) was used to accomplish the required multiplication
operation. The optical properties of BaTiO3 were characterized in
our laboratory by Otto Spitz in his Master's thesis [5]. Afterwards,
Greg Henderson applied the photorefractive property of the crystal
to implement the result of Equation (1) optically.

The process of four-wave mixing is shown in Fig. 2. Three
laser beamlets are incident onto the BaTiO3 crystal. The x matrix
is encoded onto one pump beam, and the x^T matrix is encoded onto
the counterpropagating pump beam. The W weight matrix is encoded
onto the probe beam. The output is a phase-conjugate beam which
is proportional to the product of the three incident beams. With
a one-neuron processor, a 4-element input vector, and a 4 x 4
binary weight matrix, the system was able to achieve a 13.8 dB power
signal-to-noise ratio. Angle multiplexing was then used to
increase the number of neurons, but the power signal-to-noise
ratio was reduced to 9.54 dB. Thus, spatial multiplexing
was chosen over angle multiplexing for increasing the number of
neurons that can be packed inside a single BaTiO3 crystal.
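To put the two signal-to-noise figures in linear terms, the short sketch below converts a power ratio in dB to a plain ratio. It assumes only the standard definition dB = 10 log10(ratio); the function name is illustrative.

```python
def db_to_power_ratio(db):
    """Convert a power ratio expressed in decibels to a linear ratio."""
    return 10.0 ** (db / 10.0)

single_neuron = db_to_power_ratio(13.8)       # roughly 24 to 1
angle_multiplexed = db_to_power_ratio(9.54)   # roughly 9 to 1
```

Under this definition the angle-multiplexed configuration retains a little more than a third of the single-neuron power signal-to-noise ratio.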


Fig. 2. Four-wave mixing in barium titanate (BaTiO3). The X
and X^T matrices are encoded on the counterpropagating
pump beams while the W matrix is encoded on the probe
beam. The phase-conjugate output is summed by the lens
to generate the result X^T W X.


B.  Optical  Quadratic  Perceptron  Neural  Network 

This project, completed by Alex Huynh, investigated a closed-loop
neural network constructed from the previously discussed
optical quadratic neurons. The quadratic network can dramatically
increase the speed of convergence because the inputs are
pre-correlated in pairs before being introduced to the single layer of
processing neurons. An operational quadratic neural network with
a feedback path has been successfully realized after some
electronics (a computer, video cameras, scanrate converters, etc.)
were interfaced with the optics. For training the neural network,
the popular Perceptron learning algorithm in its quadratic form
was chosen [7]. To implement this algorithm, each neuron, composed
of both positive and negative polarities, is spatially multiplexed
onto the BaTiO3 crystal.

The two-neuron architecture developed by Alex Huynh is shown
in Fig. 3. As previously discussed, the vector-matrix-vector
quadratic product operation is performed by four-wave mixing in
BaTiO3. The phase-conjugate output emerges from the crystal and is
reflected by a beamsplitter (BS3) onto the charge-coupled device
(CCD) camera and then digitized into the computer. To allow easy
modifications, the input and interconnection matrices are now
generated on a Macintosh computer and written to a monochrome
liquid-crystal television (LCTV). The computer's tasks also
include thresholding the BaTiO3 output, comparing this
thresholded output with a specified target value, and altering the
interconnection matrix accordingly. The neural network iterates
until convergence is achieved.
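The closed loop just described (compute the quadratic product, threshold, compare with the target, update the interconnection matrix) can be sketched in software. This is a minimal simulation under stated assumptions, not the optical implementation: the update rule W += t * outer(x, x) is the usual Perceptron rule applied to the pairwise products, and the bias-augmented bipolar encoding, the function names, and the Exclusive-OR test case are choices made here for illustration.

```python
import numpy as np

def predict(W, x):
    """Threshold the quadratic product x^T W x to a bipolar output."""
    return 1 if x @ W @ x > 0 else -1

def train_quadratic_perceptron(inputs, targets, max_iters=50):
    """Perceptron learning on the pairwise products of the inputs."""
    n = inputs.shape[1]
    W = np.zeros((n, n))
    for iteration in range(1, max_iters + 1):
        errors = 0
        for x, t in zip(inputs, targets):
            if predict(W, x) != t:
                W += t * np.outer(x, x)   # move the weights toward the target
                errors += 1
        if errors == 0:                   # every pattern classified correctly
            return W, True, iteration
    return W, False, max_iters

# Exclusive-OR in bipolar form, with a constant 1 prepended as a bias input.
inputs = np.array([[1, a, b] for a in (-1, 1) for b in (-1, 1)], dtype=float)
targets = np.array([1 if a != b else -1 for a in (-1, 1) for b in (-1, 1)])

W, converged, iterations = train_quadratic_perceptron(inputs, targets)
```

A single linear threshold unit cannot separate these four patterns, whereas the quadratic unit can because the product of the two inputs appears among its terms.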

During  the  performance  tests,  the  system  was  configured  as 
two  bipolar  neurons  of  3  x  3  elements  each.  The  network  converged 
to  the  desired  output  values  within  50  iterations  for  all  2-bit 
target  permutations.  Although  two  neurons  were  tested,  the  number 
of  neural  processors  that  can  be  placed  inside  a  single  barium 
titanate  crystal  has  been  experimentally  increased  to  four  and  may 
be  theoretically  increased  up  to  the  number  of  pixels  of  the  LCTV. 


Fig. 3. BaTiO3-based quadratic network with learning.


In  summary,  an  operational  optical  quadratic  Perceptron  neural 
network  has  been  developed  with  the  capabilities  to  learn  and 
classify  binary  patterns. 


3. Determination of Hopfield Associative Memory Characteristics

A.  Introduction 

It has been previously shown by Hopfield that associative
memories based on the Hopfield neural network model (which we call
first order Hopfield associative memories, or HAMs) are capable of
storing information, usually specified by N-dimensional binary
vectors, in a distributive way as well as recalling the complete
stored information when presented with a noisy input
(error-correction). Through a "learning" process utilizing the sum of
outer products algorithm, information is learned as stable states
in the self-feedback Hopfield network. The recall process starts
with a probe input and iterates until a stable state is
reached. The number of storable stable states, however, is
limited by the number of neurons (N) and the degree of
error-correction desired. This issue of memory capacity had previously
been investigated by numerous researchers, based on the assumption
of N → ∞. Thus, if N is small the error in their
capacity predictions may be significant. In addition, no specific
procedure or proof for calculating the attraction radii of HAMs
has been given in the past.
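The storage and recall procedure described above can be sketched as follows. This is a minimal first order illustration under stated assumptions: the stored vectors are rows of a Sylvester Hadamard matrix (chosen here only so the example is deterministic), the diagonal of the memory matrix is set to zero, and updates are synchronous. All function names are hypothetical.

```python
import numpy as np

def hadamard(order):
    """Sylvester construction of a 2**order by 2**order matrix of +1/-1 entries."""
    H = np.array([[1]])
    for _ in range(order):
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    return H

def store(patterns):
    """Sum of outer products learning with a zero diagonal."""
    T = patterns.T @ patterns
    np.fill_diagonal(T, 0)
    return T

def recall(T, probe, max_iters=20):
    """Iterate synchronous updates until the state stops changing."""
    state = probe.copy()
    for _ in range(max_iters):
        new_state = np.where(T @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

patterns = hadamard(4)[1:4]     # M = 3 stored vectors with N = 16 neurons
T = store(patterns)

probe = patterns[0].copy()
probe[0] *= -1                  # noisy input: one error bit
recovered = recall(T, probe)
```

With so few stored vectors the single error bit is corrected; loading the network more heavily or adding more error bits eventually breaks recall, which is the capacity and attraction-radius question the section goes on to address.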

During this funding period, a new statistical method using a
single signal-to-noise parameter (which we call C) was developed
as a tool to study both first order and higher order
HAMs. The C parameter method, which emphasizes the
importance of specifying the required network convergence
probability, is found to characterize the scaling properties of
HAMs more accurately and efficiently than other researchers'
methods [10, 11, 12].

B.  Mathematical  Formulation 

Given  N,  M  (the  number  of  stored  vectors) ,  and  b  (the  number 
of error bits in the probe vector), the C parameter for a pth order
HAM  is  found  to  be 


The  probability  that  a  neuron  will  hold  an  incorrect  bit  after  a 
single  update  cycle  is  calculated  as 


The characteristic of these two equations is that they are
invariant in their form as p changes. Once the values of N, M, b,
and p are given, the value of C is determined and η, the
single-update bit error probability, can be calculated accordingly.
Finally, with the approximate joint
independence which exists among all neurons when the memory
performs its update synchronously, the value of η solely
determines the convergence probability of the network. Thus, the
essential ingredient of the C parameter method is the one-to-one
mapping of the C parameter to the convergence probability
of HAMs.
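The single-update bit error probability can also be estimated empirically, which gives a feel for how it depends on N, M, and b. The sketch below is a Monte Carlo estimate for a first order, zero-diagonal, outer-product memory with random bipolar vectors; it does not use the report's closed-form expressions, and the function name, trial count, and parameter values are assumptions made for illustration.

```python
import numpy as np

def estimate_bit_error(N, M, b, trials=200, seed=0):
    """Fraction of neurons holding a wrong bit after one synchronous update."""
    rng = np.random.default_rng(seed)
    wrong = 0
    for _ in range(trials):
        patterns = rng.choice([-1, 1], size=(M, N))
        T = patterns.T @ patterns
        np.fill_diagonal(T, 0)
        probe = patterns[0].copy()
        flipped = rng.choice(N, size=b, replace=False)
        probe[flipped] *= -1                      # introduce b error bits
        updated = np.where(T @ probe >= 0, 1, -1)
        wrong += np.count_nonzero(updated != patterns[0])
    return wrong / (trials * N)

lightly_loaded = estimate_bit_error(N=100, M=5, b=5)
heavily_loaded = estimate_bit_error(N=100, M=50, b=5)
```

The estimate grows as more vectors are stored for a fixed N, which is the qualitative behavior a signal-to-noise description of the memory would predict.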

C.  Applications 

With  the  aid  of  the  C  parameter,  we  obtained  the  following 
key  scaling  properties  for  various  versions  of  the  HAM. 

(1) The memory capacity and the attraction radius of the
direct convergence (one-step) HAM, in which the initial vector is
required to precisely converge to the stored vector in one
iteration, can be predicted. It has been shown [14] that, when
storing M' vectors, where M' is less than the memory capacity M, the
Hopfield network will likely converge to a stored vector after the first
iteration. Therefore, the capacity derived for the direct
convergence HAM when N → ∞ is also the asymptotic capacity for the
Hopfield network.

(2) The memory capacity and the attraction radius of the
indirect convergence HAM, in which a specified percent error ε is
allowed after multiple iterations, can be computed. Both the
cases of first order [14, 15] and higher order HAMs were
investigated for (1) and (2).

(3)  In  (1)  and  (2)  we  also  showed,  using  the  close  tie 
between  convergence  probability  and  the  C  parameter,  that  given  a 
fixed  convergence  probability,  the  memory  capacity  can  be  traded 
for  an  increase  of  attraction  radius,  or  generalization 
capability,  and  vice  versa. 

(4) Figures of merit for the performance of HAMs have been
formulated. Two statistical parameters that can be used to
determine the performance of arbitrary order HAMs were developed.
The principle involves using two figures of merit, ε/η and ηN, to
determine the convergence probability for the indirect convergence and
direct convergence HAMs described in (1) and (2). Given η, the
parameter ε/η determines the capability of converging iteratively
to at most εN bits away from the stored vector, where 0 < ε < 0.5. We
showed that the indirect convergence probability Pic = 1.0 for all
HAMs having ε/η > 20. On the other hand, if precise convergence to
the stored vector in one step is required, the parameter ηN is
used to determine the probability of direct convergence, Pdc.

Since these two parameters, ε/η and ηN, can determine the
convergence probability of the indirect convergence and direct
convergence HAMs, they in turn can determine the memory capacity
for these HAMs. This argument means that in the case of indirect
convergence, given any two of the parameters ε, M or N, we can
find the third parameter. Similarly, for the case of direct
convergence, given any two of the parameters Pdc, M or N, we can
find the third parameter.
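One way to see why the product of the bit error probability and N governs direct convergence is to assume, as the text does approximately, that the neurons err independently during a synchronous update. Under that assumption the chance that all N bits are correct after one step is (1 - η)^N, which is close to exp(-ηN) when η is small. The sketch below only illustrates this reasoning; it is not the report's expression, and the function names are hypothetical.

```python
import math

def direct_convergence_probability(eta, n):
    """Probability that all n neurons are correct, assuming independent errors."""
    return (1.0 - eta) ** n

def exponential_approximation(eta, n):
    """Small-eta approximation that depends only on the product eta * n."""
    return math.exp(-eta * n)

exact = direct_convergence_probability(1e-4, 100)
approx = exponential_approximation(1e-4, 100)
```

Both values are near 0.99 for these inputs, and any pair of η and N with the same product gives nearly the same result, so fixing a required convergence probability fixes that product.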

(5) Unique characteristics of HAMs with nonzero-diagonal
terms in the memory matrix (which we call NZAM) were determined.
We applied the C parameter method in studying the unique
characteristics of this special version of the HAM. It has been
shown that for an outer-product type network, e.g., the one
investigated by Hopfield, the ratio M/N = α holds for small α's,
e.g., α = 0.15. As in the HAM, the memory matrix of the NZAM is
constructed to store M vectors based on the outer-product learning
algorithm, but all the diagonal terms of the memory matrix are set
to be M. Assuming the input error ratio p = 0, we theoretically
proved a surprising simulation result by Stiles et al. that in
the NZAM the probability of successful recall, Pr, steadily
decreases as α increases, but as α increases past 1.0, Pr begins
to increase slowly [20, 21]. In particular, if Pr is expressed as a
function of α, there exist double roots α1 and α2 such that
α1α2 = 1 and Pr(α1) = Pr(α2). This is unique in the sense that it
occurs only in the NZAM of first order.

Even when 0 < p < 0.5, the NZAM is unique in its own way and
results in special network behavior due to the nonzero diagonal
terms. The property Pr(α1) = Pr(α2), however, no longer exists in
this case. When 0 < p < 0.5, the network exhibits strong
error-correction capability if α < 0.15, and this capability is shown to
rapidly decrease as α increases. The network essentially loses
all its error-correction capability at α = 2, regardless of the
value of p. In the extreme case of α >> 1, the network acts like an
all-pass filter and the number of stable states increases to 2^N.
When 0 < p < 0.5, and under the constraint of Pr > 0.99, the tradeoff
between the number of stable states and their attraction force was
analyzed and the memory capacity was shown to be 0.15N at best.
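The all-pass behavior for a heavily loaded memory with a nonzero diagonal can be checked by brute force on a small network. The sketch below enumerates every bipolar state of an N = 8 network and counts how many are stable under one synchronous update; the sizes, the random stored vectors, and the function name are assumptions made for illustration.

```python
import itertools
import numpy as np

def stable_fraction(N, M, diagonal, seed=0):
    """Fraction of all 2**N bipolar states left unchanged by one update."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1, 1], size=(M, N))
    T = patterns.T @ patterns                  # outer-product learning
    np.fill_diagonal(T, diagonal)              # M for the NZAM, 0 for the HAM
    states = np.array(list(itertools.product([-1, 1], repeat=N)))
    updated = np.where(states @ T >= 0, 1, -1)  # T is symmetric
    return np.all(updated == states, axis=1).mean()

nzam = stable_fraction(N=8, M=800, diagonal=800)   # M/N = 100
ham = stable_fraction(N=8, M=800, diagonal=0)
```

With the diagonal set to M and M far larger than N, the self-feedback term dominates and every state maps to itself, so the network passes any input through unchanged. With a zero diagonal, two states that differ in a single bit cannot both be stable, so at most half of the states can be.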
4. Reduction of Interconnection Weights in Higher Order HAMs

The main goal of this project was to apply the C parameter
technique to eliminate redundant interconnection weights in
higher order HAMs. It has been shown that the memory capacity of
higher order HAMs can increase rapidly as the order of the network
p increases. But the problem with higher order HAMs is that
the number of interconnection weighting terms also increases very
rapidly with the number of inputs N and the order p, and it
becomes unacceptably large for use in many situations. In
general, the number of independent weighting terms in the pth order
expansion is approximately N^(p+1)/p!. Previous techniques for
dealing with this proliferation problem involve using a priori
knowledge of the problem domain to eliminate the terms which have
a small likelihood of being useful. This prelearning method
produces specialized networks which are useful in a limited
domain, such as geometric invariance in pattern recognition. As
for the case of HAMs, no efficient method for tackling the
proliferation problem has previously been reported.
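A quick calculation shows how fast the number of weighting terms grows. The sketch below compares the approximation N^(p+1)/p! with an exact count that assumes each of the N neurons receives one weight for every unordered group of p distinct inputs; that counting convention and the function names are assumptions made here, not taken from the report.

```python
from math import comb, factorial

def approximate_weight_count(n, p):
    """Approximate number of independent pth order weights, N**(p+1) / p!."""
    return n ** (p + 1) / factorial(p)

def exact_weight_count(n, p):
    """N neurons, each with one weight per unordered set of p distinct inputs."""
    return n * comb(n, p)

for n, p in [(10, 2), (100, 2), (100, 3)]:
    print(n, p, exact_weight_count(n, p), approximate_weight_count(n, p))
```

For N = 100 the count rises from about half a million weights at second order to more than sixteen million at third order, which illustrates why pruning redundant weights matters.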

During this funding period, we showed that among all
connection weights T based on the outer-product rule, -M < T < M,
principal weights called Tpr, with √M < |Tpr| < M (Tpr ∈ T), carry
more information than the others [17]. We proved that HAMs using only
these principal weights are capable of achieving good recall results.
Using only Tpr weights can result in a savings of more than 50% of the
original number of connection weights.
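The size of the savings can be illustrated with a small experiment. The sketch below assumes the principal-weight condition is |T| > sqrt(M) and, for brevity, uses first order outer-product weights from random bipolar vectors as a stand-in for the higher order case; the sizes and names are hypothetical.

```python
import numpy as np

def principal_weight_mask(T, M):
    """Mark weights whose magnitude exceeds sqrt(M)."""
    return np.abs(T) > np.sqrt(M)

rng = np.random.default_rng(0)
N, M = 64, 25
patterns = rng.choice([-1, 1], size=(M, N))
T = patterns.T @ patterns                    # outer-product rule

off_diagonal = ~np.eye(N, dtype=bool)        # ignore the self-connections
kept = principal_weight_mask(T, M) & off_diagonal
kept_fraction = kept.sum() / off_diagonal.sum()
```

Because each weight is a sum of M random signs, most weights fall within one standard deviation (sqrt(M)) of zero, so well under half of them pass the threshold in this setting.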

We  have  proposed  a  3-layer  neural  network  that  explores  the 
advantages  of  the  principal  weights  described  above.  The  proposed 


network includes (1) an input layer, (2) a hidden layer that
contains product units, and (3) an output layer that contains
ordinary sigmoidally-thresholded summing units. The network
operations  consist  of  three  phases:  (1)  preprocess  the  prescribed 
associative  vectors  and  select  the  principal  weights  Tpr;  (2) 
create  the  required  number  of  product  units  and  interconnections 
according  to  the  results  obtained  in  (1);  and  (3)  train  the 
network  using  the  backpropagation  learning  algorithm  until  high 
memory  recall  accuracy  is  achieved.  We  are  definitely  encouraged 
by  the  results  obtained  to  date  on  this  network,  in  particular  the 
tremendous  improvement  in  the  training  speed  and  the  efficient 
implementation.  We  believe  that  further  investigation  of  this 
network  is  of  great  interest  in  the  field  of  associative  memory 
and  pattern  recognition. 
4. Developed a signal-to-noise ratio parameter-based performance

in HAMs while retaining their memory storage and
generalization capabilities.


